Model for Load Balancing on Processors in Parallel Mining of Frequent Itemsets
Abstract
Given the many large distributed transaction databases with large data schemas, a centralized approach to mining association rules in such databases is not feasible. Some distributed algorithms have been developed [FDM, CD], but none of them considers the problem of data skew in distributed mining of association rules. Skew in the datasets degrades the workload balance among the processors involved in distributed mining of association rules. It is therefore important to devise an efficient approach to distributed association rule mining that can generate homogeneous partitions of the whole dataset, so that the supports of most large itemsets are distributed evenly across the processors. We propose an efficient partitioning technique based on stratified sampling, which generates homogeneous partitions on which processors work in parallel and produce their local concepts at approximately the same time.
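The abstract does not say how the strata are formed or how the partitions are built, so the following is only a minimal sketch of the general idea of stratified partitioning, not the authors' method. It assumes, purely for illustration, that transactions are stratified by their length and that each stratum is shuffled and dealt round-robin to the processors. The names `stratified_partition` and `local_support` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_partition(transactions, n_partitions, stratum_key=len, seed=0):
    """Split transactions into n_partitions, spreading each stratum evenly.

    Assumption (not from the source): a stratum is the set of transactions
    sharing the same value of stratum_key, by default the transaction length.
    """
    rng = random.Random(seed)

    # Group transactions by stratum.
    strata = defaultdict(list)
    for t in transactions:
        strata[stratum_key(t)].append(t)

    partitions = [[] for _ in range(n_partitions)]
    offset = 0  # running index, so overall partition sizes differ by at most 1
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)  # random order within the stratum
        for i, t in enumerate(members):
            partitions[(offset + i) % n_partitions].append(t)
        offset += len(members)
    return partitions


def local_support(partition, itemset):
    """Fraction of transactions in one partition that contain itemset."""
    if not partition:
        return 0.0
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in partition) / len(partition)
```

Calling `stratified_partition(transactions, 4)` on a database sorted by transaction length, a simple form of skew, gives four partitions in which each length class is represented almost equally. `local_support` can then be compared across partitions to check how evenly the support of an itemset is distributed.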
Similar resources
Parallel Implementation of Apriori Algorithm
Association rule mining is used to show relations between items in a set of items. The Apriori algorithm is used for mining frequent itemsets from large databases. Parallelism is used to reduce time and increase performance, with a multi-core processor used for parallelization. Mining in a serial manner can consume time and reduce performance. To solve this issue we are propo...
An Improved Technique Of Extracting Frequent Itemsets From Massive Data Using MapReduce
The mining of frequent itemsets is a basic and essential task in many data mining applications. Extracting frequent itemsets, together with frequent patterns and rules, supports applications such as association rule mining and correlation analysis, as well as product sales and marketing. A number of algorithms are used in the extraction process, such as FP-growth and Eclat. But unfortunately these algorit...
Exploiting Parallelism in Association Rule Mining Algorithms
Association rule mining is one of the major techniques of data mining. It involves finding frequent itemsets with minimum support and generating association rules among them with minimum confidence. The task of finding all frequent itemsets in a large dataset requires a lot of computation, which can be reduced by exploiting parallelism in the sequential algorithms. In this paper, we provide th...
LPAS: High Efficiency Load Balancing Parallel Data Mining Algorithm
Association rule discovery plays an important role in knowledge discovery and data mining, and efficiency is especially crucial for an algorithm that finds frequent itemsets in a large database. Many methods have been proposed to solve this problem. In addition, parallel computing has become a popular trend, for example on cloud platforms, grid systems, or multicore platforms. In this paper, a high effici...
Static Load Balancing of Parallel Mining of Frequent Itemsets Using Reservoir Sampling
In this paper, we present a novel method for parallelizing an arbitrary depth-first search (DFS) algorithm for mining all frequent itemsets (FIs). The method is based on the so-called reservoir sampling algorithm. The reservoir sampling algorithm, in combination with an arbitrary DFS mining algorithm executed on a database sample, takes a uniformly but not independently distributed sample of all FI...